Abstract

This paper shows that Physics is very close to the substitution- 
diffusion paradigm of symmetric ciphers. Based on this analogy, we 
propose a new cryptographic algorithm. Statistical Physics gives de- 
sign principles to devise fast, scalable and secure encryption systems. 
In particular, increasing space dimension and considering larger data 
blocks improve both speed and security, allowing us to reach high 
throughput (larger than 10 Gb/s on dedicated hardware). The physical
approach enlarges the way to look at cryptography and is expected to
bring new tools and concepts to better understand and quantify secu- 
rity aspects. 



1 Introduction 

Symmetric cryptography consists in transforming a plain text message M 
into a cipher text M' through a series of operations that can be inverted 
by the recipient of the encoded text. The transformation is specified by a 
secret key K which is shared by the sender and the receiver (see fig. 1). This
transformation is usually considered in a purely mathematical framework, 
with no reference to any physical process. Yet, Shannon [8, 9] describes the 
generic steps of a cryptosystem in terms of the repetition of diffusion and 
confusion operations (see fig. 1). Diffusion is a well-known physical process
whose microscopic origin may be associated with a random walk. It seems,
however, that the contribution of physics to classical cryptography has been
only to provide some vocabulary but no design principles. The few physical
devices that have been proposed to encode a message are usually rather exotic
and their security is hard to certify [6].

Figure 1: The standard structure of a symmetric block cipher.

It should be noted that here we refer to cryptosystems using classical 
transmission methods, i.e. bits of information traveling along an electrical 
line. We exclude the contribution of quantum physics for which security 
does not rely on the difficulty of transforming M' back into M but on the 
impossibility to intercept the information. 

Thus, apart from quantum mechanics, the contribution of physics to
cryptography has been surprisingly small. However, the second principle of
thermodynamics gives some interesting insights: the laws of physics are such 
that in a closed system (e.g. a gas in a container), time evolution always 
produces an increase of entropy (i.e. decrease of information). In other 
words, all configurations evolve to a similar final state, in which there seems 
to be no more memory of the initial situation. 

This process is therefore a good encryption mechanism: the final
configuration reveals nothing about its origin. But what about deciphering?
Clearly, if there is no way to go back to the original configuration, our
methodology is of little interest. It is a well-known paradox of classical physics
that time always evolves toward the future and never toward the past, despite the fact
that the microscopic laws of physics are fully symmetrical with respect to
past and future. There should therefore be a way to come back. It is,
however, highly impracticable. Indeed, according to Newton's laws of mechanics,
one would have to reach every single particle of the system and exactly
reverse its microscopic velocity. How could we ever do such a thing in a real
physical system? There is no way to act on each particle and there is no
way to know its current velocity with full accuracy, not to mention that the
smallest possible error made in this reversing process will prevent
the system from evolving back to its initial state. Therefore, errors are an efficient
way to prevent message decryption. Provided that we can control them,
errors thus offer a way to implement a secret key.

Fortunately, we can export the above ideas to a framework where both
macroscopic irreversibility and microscopic reversibility coexist. This is
the field of discrete physics, exemplified by the so-called Lattice-Gas Automata
(LGA) approach described in section 2. In a discrete physical world, one can 
act on each particle with full accuracy, allowing us to run the system forward 
and backward in time and to add well controlled errors. 

This paper is organized as follows: we first define the LGA paradigm and
illustrate its behavior with respect to the second principle of thermodynamics.
Then we propose a complete cryptographic algorithm based on the LGA
approach in which physics naturally suggests simple and efficient operations,
as well as a criterion to generalize the algorithm to a wide range of topologies
and sizes. We provide some evidence of its security (evolution of the Hamming
distance, flatness of the XOR table, computational effort to break the key with
differential cryptanalysis). We finally conclude by insisting on the deep and
promising link that exists between physics and cryptography.

2 Discrete physics and Lattice-gas Automata 

Lattice-gas automata (LGA) are a special class of cellular automata (CA)
designed to provide a mesoscopic model of a physical system, such as a gas
or a fluid [1]. LGA can be thought of as a virtual universe implementing
a fully discrete abstraction of the real world. Technically speaking, these
automata consist of $N$ point particles moving on a regular lattice in $d$
spatial dimensions, in discrete time steps. The possible velocities
of each particle are restricted by the lattice topology in the sense that, in one
time step $\Delta t$, a particle moves from one site to one of its neighbors.
Thus, if $z$ is the lattice coordination number, particles may have $z$ possible
velocities denoted $\vec v_i$, $i = 1, \dots, z$. Figure 2 illustrates the situation.

Let us denote by $n_i(\vec r, t)$ the number of particles entering lattice site $\vec r$ at
time $t$ with velocity $\vec v_i$. Assuming that these particles are not deflected at site
$\vec r$, they will, at the next time step, enter the neighboring site pointed to by $\vec v_i$,
with the same velocity. In other words, $n_i(\vec r + \vec v_i \Delta t, t + \Delta t) = n_i(\vec r, t)$. This
operation is called propagation and can be globally described by an operator
$P$. If $M(t)$ is a configuration of particles over the full lattice, then

$$M(t + \Delta t) = P M(t)$$

represents the motion of all particles at every site. Note that if the lattice is
not periodic in all directions, a special treatment is required for the particles
reaching the spatial boundaries.

Figure 2: An example of an LGA, the so-called FHP model [3] defined on a
hexagonal lattice where each site has $z = 6$ neighbors.

We shall now assume that $n_i$ can be either 0 or 1 (no particle, or at
most one with velocity $\vec v_i$ at site $\vec r$ and time $t$). Also, we assume that the
particles entering the same site at the same time from different directions (i.e.
with different velocities) interact according to a pre-defined collision rule. The
result of this collision is to create new particles in some directions and to
remove some particles in other directions, as illustrated in fig. 2.

The collision process can be described by functions $C_i(n_1, n_2, \dots, n_z)$, $i = 1, \dots, z$,
taking the value 1 if the interaction produces a particle with velocity
$\vec v_i$ and 0 otherwise. After the collision, the particles move to one of their
neighboring sites, according to the velocity they have. Thus

$$n_i(\vec r + \vec v_i \Delta t, t + \Delta t) = C_i\big(n_1(\vec r, t), n_2(\vec r, t), \dots, n_z(\vec r, t)\big) \tag{1}$$

The fact that $C_i$ can be either 0 or 1 guarantees that the occupation numbers
$n_i$ obey our hypothesis of being 0 or 1 at all times and all lattice positions.
Therefore, the full dynamics can be described by a set of $z$ bits of information
at each lattice site and each time step.

As before, we can consider that $C$ is an operator which acts on all lattice
sites. Thus if $M(t)$ is the configuration at time $t$, eq. 1 becomes

$$M(t + \Delta t) = P C M(t)$$

where $P$ is the propagation operator.

Now, if we want our Boolean particles to behave macroscopically as
observed in a real physical system, the collision operator $C$ must be carefully
chosen. Its expression will crucially depend on the nature of the physical
process. For instance, it can be shown that, in the appropriate macroscopic
limit, fluid motion can be reproduced by such an LGA. This is the case of
the famous FHP model (2D) or FCHC models (3D) (see [1]) which can be
shown to obey the Navier-Stokes equation of hydrodynamics. The proof of
this equivalence is mathematically rather involved [1]. Intuitively, the LGA
is a caricature of the microscopic level but, at a macroscopic level, it behaves
as a real system. Over the past decade, this property has been intensively
exploited by many researchers to develop a new generation of hydrodynamics
solvers, such as the so-called Lattice Boltzmann method which keeps gaining
in popularity to model and simulate complex fluids.

When using this physical framework to build a cryptographic system, one
property of direct importance is time reversibility. For a classical physical
system of particles, e.g. a fluid, it is well known that, by modifying the velocity
of each particle from $\vec v$ to $-\vec v$, but keeping the same evolution equation,
the full system retraces its own past. For several reasons, this time-reversal
operation is out of reach in real systems and also in standard molecular
dynamics simulations. First, one needs to access all the degrees of freedom
and, second, one must know to infinite precision the actual value of the
velocity. Finally, in the case of a computer model, the calculation should not
produce the slightest numerical error.

However, in the case of an LGA model, reversing time is possible because
both the particles and the dynamics are Boolean. Calculations are done with
full accuracy, without truncations or errors. We illustrate this behavior in
fig. 3. When no errors are introduced, it is possible to reverse the velocity of
each particle by applying, at a chosen time $t$, the reverse operator $R$ at each
lattice site, and we see the system return to its initial state after $t$ repetitions
of propagation $P$ and collision $C$.

The operator $R$ permutes the values of the $n_i$ at each site so as to place a
particle of velocity $\vec v_i$ in the $-\vec v_i$ direction. Therefore

$$R^2 = 1 \tag{2}$$

For instance, in the case of a lattice with four directions (North, East, South,
and West), $R$ would swap East with West and North with South. In this
case

$$R(n_1, n_2, n_3, n_4) = (n_3, n_4, n_1, n_2) \tag{3}$$

where $\vec v_1 = -\vec v_3$ and $\vec v_2 = -\vec v_4$. An interesting relation between $P$ and $R$ is

$$PRP = R \tag{4}$$

which means that reversing allows the particles to propagate back to the site
they came from and reach it with opposite velocity (which is the reason why
$PRP = R$ and not $PRP = 1$).
Figure 3: Time-reversibility in the so-called HPP LGA model. In (a), all
particles are initially in the left compartment. Then they start expanding
into the full space until a homogeneous situation is reached. Then, the
velocities of all particles are inverted, as shown in (b), and the particles naturally
trace back their way to the left compartment.

The reversibility of the collision operator means

$$CRC = R \tag{5}$$

because reversing the post-collision configuration and applying the collision
again gives the pre-collision state, but with opposite velocities.
With $PRP = R$ and $CRC = R$ one has

$$(CP)^r R (PC)^r = (CP)^{r-1}\, C P R P C\, (PC)^{r-1} = (CP)^{r-1} R (PC)^{r-1} = \dots = R$$

which shows that performing $r$ iterations, reversing the velocities and
performing $r$ iterations again takes us back to the initial configuration, with all
particles having an opposite velocity. This proves the time-reversibility of
the process.
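The identity above can be checked numerically. The following is a minimal sketch, assuming an HPP-like rule on a small periodic square lattice; the lattice size, the bit ordering (bit 0 = East, 1 = North, 2 = West, 3 = South) and all function names are choices made for this illustration, not taken from the paper. Note that the forward rounds apply collision first (PC) while the backward rounds apply propagation first (CP), exactly as in the operator product.

```python
import random

SIDE = 16  # hypothetical lattice side for the illustration
# Bit k of a site holds the particle travelling along DIRS[k]; DIRS[k + 2] = -DIRS[k].
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # East, North, West, South

def collide(state):
    """C: a lone head-on pair (E+W or N+S) is rotated by 90 degrees, else unchanged."""
    swap = {0b0101: 0b1010, 0b1010: 0b0101}
    return [[swap.get(s, s) for s in row] for row in state]

def propagate(state):
    """P: every particle moves one site along its velocity (periodic boundaries)."""
    new = [[0] * SIDE for _ in range(SIDE)]
    for y in range(SIDE):
        for x in range(SIDE):
            for k, (dx, dy) in enumerate(DIRS):
                if (state[y][x] >> k) & 1:
                    new[(y + dy) % SIDE][(x + dx) % SIDE] |= 1 << k
    return new

def reverse(state):
    """R: swap East<->West and North<->South at every site (rotate 4 bits by 2)."""
    return [[((s << 2) | (s >> 2)) & 0b1111 for s in row] for row in state]

def there_and_back(state, rounds, perturb=False):
    """Apply (CP)^r R (PC)^r, then R once more to undo the velocity flip."""
    state = [row[:] for row in state]      # work on a copy
    for _ in range(rounds):                # forward: (PC)^r, collision first
        state = propagate(collide(state))
    if perturb:
        state[0][0] ^= 1                   # add or remove a single particle
    state = reverse(state)
    for _ in range(rounds):                # backward: (CP)^r, propagation first
        state = collide(propagate(state))
    return reverse(state)

rng = random.Random(0)
# All particles start in the left half of the box.
initial = [[rng.randrange(16) if x < SIDE // 2 else 0 for x in range(SIDE)]
           for _ in range(SIDE)]

print(there_and_back(initial, 20) == initial)                # exact return
print(there_and_back(initial, 20, perturb=True) == initial)  # one error breaks it
```

Because every step is a bijection on the set of configurations, a single flipped bit necessarily leads to a final state different from the initial one.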

The time-reversal operation is possible because the system has, in fact,
not forgotten its initial condition (in fig. 3 all particles are in the left
compartment) even though it evolves to a state where this information seems
completely lost. The information is actually not lost but diluted over all degrees
of freedom. Therefore, if one perturbs just a single piece of this distributed
information, the system loses its capability to evolve back to its past. This
is shown in fig. 4 where a particle has been artificially added before reversing
time. Clearly, all the particles whose trajectories have been directly or
indirectly affected by the presence of this extra particle are unable to return
to their original positions. This single modification creates an avalanche of
perturbations which expands further and further as the number of time steps
increases. This experiment illustrates how, in an LGA paradigm, we can play
with the second principle of thermodynamics and reverse the arrow of time.
At the macroscopic level, an H-theorem can be proved (provided that $C$ is
properly chosen), meaning that an entropy can be defined and that it always
ends up growing. At the microscopic level, however, past and future are fully
equivalent provided one has full knowledge of all degrees of freedom of the
system.

Figure 4: Effect of one error on the time-reversibility of the HPP LGA model.
Compared with the situation of fig. 3(b), one extra particle is added to the
homogeneous state. As a result, the particles are no longer able to return to
their initial position, in the left compartment.

Another point worth mentioning about LGA is its intrinsic parallelism. 
All the particles move synchronously on the lattice and can be updated at 
the same time. The large degree of parallelism present in physical systems is 
naturally reproduced here and can be exploited to speed up the simulation
dramatically. The collision operation is embarrassingly parallel whereas the
propagation requires regular collective communications. 

3 A new cryptography algorithm 

In this section we make explicit the link we claimed to exist between classical
symmetric cryptographic algorithms and an LGA fluid. Then we propose a
new cryptographic algorithm which exploits this physical analogy and we 
discuss the properties of such an approach. 
3.1 Standard symmetric block-cipher



Symmetric cryptography (as opposed to public-key cryptography) assumes 
that prior to the exchange of the encoded message, the sender and the re- 
cipient have agreed on a secret key which is used, on one side to encode the 
plain text message and, on the other side to decode it. 

Block-ciphers (as opposed to stream-ciphers) consider that the information
that needs to be transmitted can be divided into blocks of, say, $N$ bits.
Each block is encoded by the algorithm and sent before the next block is 
processed. 

Symmetric cryptosystems are usually described as a combination of two
operations, termed diffusion and confusion [8, 9]. These operations are
applied alternately and iteratively to the message $M$ to be encoded. Each
application of a diffusion and confusion step is called a round. Thus the
full process consists in applying $r$ rounds to message $M$. The larger the value
of $r$, the more secure the encoding but the slower the encryption process.

In practice, confusion is achieved by transforming each byte b of M ac- 
cording to a substitution box, or S-box, which is a given invertible, non-linear 
function. Diffusion corresponds to a deterministic shuffling of the bytes across 
the full message. 

Following Kerckhoffs's principle, the secrecy of the process is not obtained by
keeping the algorithm hidden but by parameterizing some of its aspects by a
secret key $K$, made up of $N'$ bits known only to the communicating persons.
A common strategy is to XOR the key bits with the message bits before each 
round. In order to improve the security of the encoding process, the bits of 
the key are not reused identically over all rounds. They are transformed after 
each round, for instance by applying the same diffusion-confusion process as 
used for the message itself. These new sets of bits obtained at each round 
are called round keys. 

Note that, in some ciphers, the secret key $K$ is used to parametrize the
S-box rather than being a secret, dynamic mask.

Quite often, the key size N' matches the block size N. When the key 
is smaller than the message, padding is necessary. This can be a security 
issue if these additional bits are not carefully chosen. When the key is larger
than the message, the extra key bits can be used to produce the successive
round keys. When the entire message is composed of several blocks, the final
round key of block $i$ is usually used as the first round key for block $i + 1$.
Once the recipient has received all blocks, decoding can start by inverting 
the ciphering process. For this purpose, the recipient must compute the final 
round key. 

The above general discussion applies to well-known cryptographic algorithms
such as DES, AES and IDEA [7, 2].



3.2 A physically based cryptography algorithm 

By comparing the material of sections 2 and 3.1, we clearly see how the LGA
fluid bears structural similarities with the classical block-cipher engines.

The plain text message (block) corresponds to the initial configuration of 
the physical particles (one bit is the presence or absence of a particle), the 
S-box is the collision process and diffusion is obtained by our propagation.
The r rounds represent a time evolution (iterations) and decoding reflects 
time-reversal invariance. The role of the key is a bit less obvious but it can 
be thought of as a controlled error which is added to break the possibility to 
reverse time. Somehow, the round keys reproduce the space-time evolution of 
a secret configuration of particles that interact with the "message" particles. 

However, some specificities of the physical process need to be emphasized 
with respect to the classical cryptographic algorithms. 

Space dimension. Our particles move in a two-dimensional space. The
dynamics can be generalized to higher dimensions. Usually, in cryptography
the concept of space is absent, or little exploited (AES uses a matrix formulation
which is more a mathematical construct than a desire to embed the
process in physical space). In physics, it is known that higher-dimensional
spaces allow for more efficient mixing mechanisms and reduce the
space diameter.

Fine grain diffusion. In the LGA model, particles are mixed across space
through the propagation process. Its purpose is to dilute the information
carried by the initial configuration over all degrees of freedom. This "diffusion"¹
takes place at the bit level, whereas it is usually done at the byte level in
standard cryptography.

Collision. Like an S-box, collision is a local operation since it is repeated
independently over all lattice sites. It can be implemented as a lookup table
and acts upon $z$ bits where $z$ is the lattice coordination number. Thus,
depending on the lattice topology, collision acts on pieces of information
that can be smaller than, larger than or equal to a byte. In the discrete fluid, $C$ is
reversible, that is, the same function is used forward in time or backward in
time. This is a significant simplification when hardware implementation is
considered. On the other hand, in a physical model, the collision $C$ may
have some undesired properties with respect to cryptography. In our fluid
example, the number of particles is conserved by $C$ (because Nature conserves it).
These conservation laws must be removed when devising a cryptographic
scheme.

¹ In physics, diffusion refers to a random walk, which is not what our propagation does.

Scalability and parallelism. The LGA fluid is composed of the juxtaposition
of many identical units: the lattice cells. In addition, these lattice cells
are locally connected according to a simple topology. Therefore, systems of
arbitrary size can be constructed with the same design principle and the same
simplicity. The locality of the collision and the regularity of the diffusion
allow for massive parallelism (one processing element at each lattice site,
for any lattice size).

The above discussion shows that classical mechanics, in the flavor of a
discrete physical system (LGA or CA model), offers a natural framework for
designing a cryptographic device. It actually resembles the classical designs
obtained from mathematical principles but has some interesting features:
ciphers of any size can be produced with the same design principle, in which
parallelism is ensured. Moreover, the simplicity and regularity of the design,
and the match between the 2D physical space and the spatial layout of
integrated circuits, allow fast hardware implementation.

3.3 A specific instance of the algorithm 

We can now define more precisely an instance of a cryptographic algorithm 
based on the physics analogy. We consider a 2D square lattice, with z = 8 
links (up, left, right, down, plus the four diagonals) as illustrated in fig. 5. In 
the LGA jargon, this topology is often referred to as D2Q8 (2 Dimensions, 
8 Quantities). The lattice is a square of size $\sqrt{N/z} \times \sqrt{N/z}$, with periodic
boundaries. Each site contains $z = 8$ bits (i.e. a byte) and the full lattice
can encrypt a block $M$ of $N$ bits. The propagation $P$ and reverse operator
$R$ are as described in section 2: each of the eight bits of a given site is moved
to one of the eight neighbors of that site and $R$ swaps the bits traveling in
opposite directions.

In order to build a collision which is reversible (i.e. $CRC = R$), but has
no other symmetries, we first build a random, self-inverse function $C'$ such
that $C'C' = 1$. Then, clearly, $C = C'R$ is reversible because

$$CRC = C'R\,R\,C'R = C'C'R = R$$

To produce $C'$ one has to define the image $j = C'(i)$ for all $i = 0, 1, \dots, 2^z - 1$.
We start with $i = 0$ and choose $j$ at random. Then we immediately set
$C'(j) = i$ to ensure that $C'$ is its own inverse. We then proceed similarly with the next $i$
for which the image or pre-image has not yet been computed. The operator
$C'$ obtained in this way can be further tested if needed to ensure that it has
no accidental undesired features. The function $C = C'R$ is then published
as part of the algorithm.

Figure 5: The 2D lattice used in the Crystal algorithm. Here, a system
of size 8 x 8 is shown. Each gray box contains 8 bits so that the full block
has 512 bits. Here we assume a periodic topology: the left and right borders
wrap around, as do the upper and lower ones.
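The construction of the collision table can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the bit ordering (direction k opposite to direction k + 4, so that R swaps the two nibbles of a byte), the seed, and the names `random_involution`, `reverse_site` and `make_collision` are all hypothetical. The random choice of j is restricted to values that have not been assigned yet, which is what keeps the table self-inverse.

```python
import random

Z = 8  # bits per site for the D2Q8 topology

def reverse_site(x):
    """R on one site, assuming direction k is opposite to direction k + 4."""
    return ((x << 4) | (x >> 4)) & 0xFF

def random_involution(size, rng):
    """Build a random self-inverse table C' (C'[C'[i]] == i for all i)."""
    table = [None] * size
    unassigned = list(range(size))
    for i in range(size):
        if table[i] is not None:
            continue                      # image already fixed as a pre-image
        j = rng.choice(unassigned)        # j may equal i (a fixed point)
        table[i], table[j] = j, i         # set C'(i) = j and C'(j) = i together
        unassigned.remove(i)
        if j != i:
            unassigned.remove(j)
    return table

def make_collision(seed):
    """Return the lookup table of C = C'R, i.e. C(x) = C'(R(x))."""
    c_prime = random_involution(1 << Z, random.Random(seed))
    return [c_prime[reverse_site(x)] for x in range(1 << Z)]

collision = make_collision(seed=12345)

# Check the reversibility condition CRC = R on every site value.
print(all(collision[reverse_site(collision[x])] == reverse_site(x)
          for x in range(1 << Z)))
```

A table built this way is a permutation of the 256 site values, so the collision step loses no information.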

The secret key $K$ has $N'$ bits. We assume that $N' \le N$. If $N' < N$
some padding is needed. We produce the $N - N'$ remaining bits by applying,
for instance, the Kirkpatrick-Stoll procedure [5] to build random bits with
density 1/2 out of an initial set: $b_\ell = b_{\ell-250} \oplus b_{\ell-103}$, which can always be
applied if $N' > 250$. Note that the lagged Fibonacci method [4] can also be used:
$b_\ell = b_{\ell-55} \oplus b_{\ell-24}$.
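The padding recurrence can be sketched in a few lines. The function name `pad_key` and the representation of the key as a list of 0/1 integers are assumptions made for this example; the default lags are the Kirkpatrick-Stoll values quoted above, and passing 55 and 24 gives the lagged Fibonacci variant.

```python
def pad_key(key_bits, n_bits, lag_long=250, lag_short=103):
    """Extend key_bits to n_bits using b[l] = b[l - lag_long] XOR b[l - lag_short]."""
    if len(key_bits) < lag_long:
        raise ValueError("the key must hold at least lag_long bits")
    bits = list(key_bits)
    while len(bits) < n_bits:
        # bits[-lag] is b[l - lag] when the bit being appended has index l
        bits.append(bits[-lag_long] ^ bits[-lag_short])
    return bits

# Usage: pad a 256-bit key to a 512-bit block (the block size of fig. 5).
key = [(i * i + i // 3) % 2 for i in range(256)]   # arbitrary example bits
padded = pad_key(key, 512)
print(len(padded))
```

The original key bits are kept unchanged at the front of the padded key; only the missing bits are generated.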

Then the round key at iteration $m$ is obtained by successive application
of $PC$ to the previous round key at iteration $m - 1$. Finally the round keys
are combined with the iterated message with an XOR.

In order to ensure that the encryption and decryption are completely
identical, it is convenient to start the process with a reverse and a propagation
step. This gives the following algorithm (whose structure remains the same
even with a different topology):

algorithm Crystal(M, K)
    reverse(M), reverse(K)
    propagation(M), propagation(K)
    repeat r times
        M = M xor K
        collision(M), collision(K)
        propagation(M), propagation(K)
    end repeat
    M = M xor K
    return M, K
end algorithm

where XOR is the same as $\oplus$ and $K$ refers to the padded key. In appendix A,
it is demonstrated that this scheme is reversible, namely that if $(M', K') =$
Crystal$(M, K)$ then $(M, K) =$ Crystal$(M', K')$.

The appropriate value of the number of rounds $r$ is discussed later.
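To make the listing concrete, here is a minimal runnable sketch of the same structure on a small D2Q8 lattice. Several details are assumptions of this example rather than the paper's specification: the lattice side (4, i.e. a 128-bit block), the ordering of the eight directions, the fixed seed, and the way the self-inverse table is drawn (random pairing of shuffled values instead of the sequential procedure of the text). The point of the sketch is only to show that applying the function twice returns the original message and key.

```python
import random

L = 4  # hypothetical lattice side; block size is 8 * L * L = 128 bits
# Direction k + 4 is opposite to direction k, so R swaps the two nibbles.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def reverse_site(x):
    return ((x << 4) | (x >> 4)) & 0xFF

def make_collision(seed):
    """C = C'R with C' a random involution (pairs of shuffled values)."""
    values = list(range(256))
    random.Random(seed).shuffle(values)
    c_prime = [0] * 256
    for a, b in zip(values[0::2], values[1::2]):
        c_prime[a], c_prime[b] = b, a
    return [c_prime[reverse_site(x)] for x in range(256)]

COLLISION = make_collision(2024)  # public part of the algorithm

def reverse(state):
    return [[reverse_site(s) for s in row] for row in state]

def collide(state):
    return [[COLLISION[s] for s in row] for row in state]

def propagate(state):
    """Move bit k of every site to the neighbor along DIRS[k] (periodic)."""
    new = [[0] * L for _ in range(L)]
    for y in range(L):
        for x in range(L):
            for k, (dx, dy) in enumerate(DIRS):
                if (state[y][x] >> k) & 1:
                    new[(y + dy) % L][(x + dx) % L] |= 1 << k
    return new

def xor(a, b):
    return [[p ^ q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def crystal(m, k, rounds):
    """Same steps as the pseudocode; returns the new message and the new key."""
    m, k = propagate(reverse(m)), propagate(reverse(k))
    for _ in range(rounds):
        m = xor(m, k)
        m, k = collide(m), collide(k)
        m, k = propagate(m), propagate(k)
    return xor(m, k), k

rng = random.Random(1)
message = [[rng.randrange(256) for _ in range(L)] for _ in range(L)]
key = [[rng.randrange(256) for _ in range(L)] for _ in range(L)]

cipher, final_key = crystal(message, key, rounds=6)
decoded, start_key = crystal(cipher, final_key, rounds=6)
print(decoded == message and start_key == key)
```

Decryption uses the final round key returned by encryption, which the recipient can compute from the shared key alone since the key evolution does not depend on the message.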

4 Properties 

In this section we derive some important properties of our cryptographic 
algorithm. The results will show that we can encrypt large blocks of data by 
taking a large enough lattice. We will show that this process increases both
throughput and security. 

4.1 Hamming distance 

First we consider the number $r$ of iterations needed to make two initial messages
$M_1$ and $M_2$ as different as possible from each other. We assume that
messages $M_1$ and $M_2$ are identical except for one bit. The speed at which
these two initial conditions diverge from each other reflects the discrete Lyapunov
exponent of the dynamics. The question is to determine how many
steps are necessary for the single-bit error to have "polluted" the full system.
From that point on, all degrees of freedom have been informed that the two
messages actually differ.

Since the key evolves similarly in the two messages, it is irrelevant and 
can be omitted from the discussion. After each collision-propagation step, 
information propagates away from the initial source of error. For the D2Q8
topology we discuss here, it is easy to show that the number of sites that can
be reached from an initial lattice position grows with the number of rounds $r$
as a square-shaped region $A(r)$ of side $2r + 1$ (and of area $(2r + 1)^2$). This
relation simply reflects that, at round $r + 1$, all lattice sites that border
$A(r)$ according to the D2Q8 topology will belong to $A(r + 1)$.

Thus, when $(2r + 1)^2 = N/z$, all the sites are informed. This value of
$r \approx \frac{1}{2}\sqrt{N/z}$ corresponds to the diameter $d$ of the lattice.
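The growth of the region $A(r)$ can be checked by direct enumeration. The sketch below, with the hypothetical helper `reachable_sites`, follows the rule stated above (every site bordering the current region joins it at the next round) on a periodic square lattice; it only counts sites and does not simulate the cipher itself.

```python
def reachable_sites(side, rounds):
    """Number of sites in A(r) on a periodic side x side D2Q8 lattice."""
    region = {(0, 0)}                     # the site holding the initial error
    neighbors = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                 if (dx, dy) != (0, 0)]   # the eight D2Q8 links
    for _ in range(rounds):
        border = {((x + dx) % side, (y + dy) % side)
                  for (x, y) in region for (dx, dy) in neighbors}
        region |= border                  # A(r + 1) = A(r) plus its border
    return len(region)

# On a 64 x 64 lattice the count follows (2r + 1)^2 until the region wraps around.
for r in (1, 2, 3, 10):
    print(r, reachable_sites(64, r), (2 * r + 1) ** 2)
```

Once $2r + 1$ reaches the lattice side, the count saturates at the total number of sites $N/z$.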

Since there are $z$ bits per site and two random patterns differ on average
by half of their bits, the Hamming distance between $M_1$ and $M_2$ after $r$ steps
is ideally expected to evolve as

$$H = \frac{1}{2}\, z\, (2r + 1)^2 \tag{6}$$

Figure 6 shows the evolution of the normalized Hamming distance
$h(r, M_1, M_2) = H/N$ in the case of our algorithm. We observe that, essentially, a number
$r = \sqrt{2}\,d$ of rounds is needed to reach the plateau $h = 1/2$ where each bit of the
two configurations differs randomly. Thus, fluctuations of magnitude $1/\sqrt{N}$
around the value 1/2 are expected. The plateau $h = 1/2$
is reached more slowly than predicted by eq. 6 because after a collision (or substitution),
only about $z/2$ bits differ from the reference configuration. As shown
in fig. 7, the error thus propagates as a disk and not a square. The radius
of that error disk grows on average by one lattice site at each iteration. Thus,
during the first $r = \sqrt{N/z}/2$ iterations, $H$ behaves as

$$H = \frac{1}{2}\, z\, \pi r^2 \tag{7}$$
Figure 6: Evolution of the Hamming distance between two messages initially
differing by only one bit. Panel (a): nx = 64, 100 configurations, $N = 64 \times 64 \times 8 = 32768$ bits.
Panel (b): nx = 4, 100 configurations, $N = 4 \times 4 \times 8 = 128$ bits. The dotted parabola is
the ideal curve (eq. 6) and the solid parabola is the theoretical estimate
of eq. 7. Finally, the vertical line shows the iteration at which, according to
eq. 8, the plateau should be reached.
Figure 7: Snapshot of the error propagation region, after 16 and 32 iterations,
in a system of size 64 x 64. The non-blank regions indicate where the two
configurations differ. The darker the gray, the more bits differ.
The dashed-line disks have radius 16 and 32, respectively; thus, the error
propagates at speed one for this topology.

When the error disk has reached the boundary of the lattice, the Hamming
distance grows more slowly. A few more steps are needed to reach the corners of
the domain. Assuming that the radius of the disk keeps growing at speed 1,
the total number of iterations is equal to half the length of the diagonal, i.e.

$$r = \frac{\sqrt{2}}{2}\sqrt{N/z} = \sqrt{2}\,d \tag{8}$$

This is the minimal number of rounds needed to mix the information all over
the system. We will argue below that more rounds may be needed to ensure
more security against cryptanalysis. Thus we write $r = \alpha d$ where $\alpha$ is some
constant. In the case of a D2Q8 lattice, we obtain

$$r = \alpha d = \frac{\alpha}{2}\sqrt{N/z} \tag{9}$$

4.2 Throughput and security

Let us assume that the number of rounds is $r = \alpha\sqrt{N/z}$, as discussed in the
previous section (the factor 1/2 of eq. 9 being absorbed into the constant $\alpha$).
If the process is fully parallelized, propagation and collision
take a constant time, $t_{pc}$. Then the time $T$ needed to encrypt the $N$ bits of
the message is

$$T = r\, t_{pc} = \alpha \sqrt{N/z}\; t_{pc} \tag{10}$$

Therefore, the encryption throughput $W$ is

$$W = \frac{N}{T} = \frac{\sqrt{z}}{\alpha\, t_{pc}} \sqrt{N} \tag{11}$$
Thus, when large data blocks are encrypted and full parallelism is implemented,
the throughput increases, even though the number of rounds also
increases. However, the number of rounds increases more slowly than the data size.

On the other hand, when the same $N$ bits are split into several smaller
blocks (as is the case with standard cryptosystems), the total throughput is
that obtained with one block and does not increase. On large enough data
sets, our approach, with full parallelism, will always give better performance.
Estimates show that for $N = 2048$, a throughput of 10 Gb/s is expected with
an FPGA implementation. Note that the hardware simplicity of our cipher
makes it possible to obtain high throughput even on small data sets.
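As a rough plausibility check of the scaling, the throughput of a fully parallel implementation can be written as the block size divided by the number of rounds times the time per round, with the number of rounds proportional to $\sqrt{N/z}$. In the sketch below the function name `crystal_throughput` and the numerical values of `alpha` and `t_pc` are purely illustrative assumptions: they are chosen so that the result matches the 10 Gb/s figure quoted above, and are not measurements from the paper.

```python
import math

def crystal_throughput(n_bits, z, alpha, t_pc):
    """Bits per second for one block: N / (alpha * sqrt(N / z) * t_pc)."""
    rounds = alpha * math.sqrt(n_bits / z)
    encryption_time = rounds * t_pc       # fully parallel: constant time per round
    return n_bits / encryption_time

# Hypothetical values: alpha = 1 and 12.8 ns per collision-propagation round.
w_2048 = crystal_throughput(2048, z=8, alpha=1.0, t_pc=12.8e-9)
w_8192 = crystal_throughput(8192, z=8, alpha=1.0, t_pc=12.8e-9)
print(w_2048 / 1e9, "Gb/s")               # about 10 Gb/s with these assumptions
print(w_8192 / w_2048)                    # a 4x larger block gives about 2x throughput
```

With these assumed numbers a 2048-bit block needs 16 rounds, so the whole block is encrypted in about 205 ns.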

Security is also improved by our approach. Indeed, it is widely accepted
that the main factor impacting security in a symmetric block cipher is the
number of rounds. In our case, the number of rounds increases as $N$ grows,
making the cipher more resistant to cryptanalytic attacks. Thus security is
improved at the same time as throughput increases.

In the next section we give an estimate of the difficulty to break our 
algorithm when differential cryptanalysis is used. 

5 Differential cryptanalysis 
5.1 Introduction 

As already mentioned, a cryptosystem is a recipe to transform a plain text
message $M$ of $N$ bits into another message $M'$, also containing $N$ bits. Since
the process must be invertible in order to achieve decryption, the transformation
must be a permutation from the set $\mathcal{M}_N$ of all $N$-bit messages into
itself. The secret key $K$ can then be seen as a parameter which selects one of
the possible permutations.

With $N$ bits, there are $2^N$ possible messages and $(2^N)!$ possible permutations.
This is by far too large a number to use all possible permutations.
That would require very long keys (with $\log_2((2^N)!) \approx (N - 1)\,2^N$ bits) to index
all of them.
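The order of magnitude of this key length follows from Stirling's approximation. As a short worked step (standard mathematics, not taken from the paper), with $n = 2^N$:

```latex
\log_2 n! \;\approx\; n \log_2 n - n \log_2 e
\quad\Longrightarrow\quad
\log_2\!\big((2^N)!\big) \;\approx\; 2^N \left(N - \log_2 e\right) \;\approx\; (N - 1.44)\, 2^N .
```

This is of the same order as the $(N - 1)\,2^N$ bits quoted above. Even for a toy block of $N = 8$ bits, indexing every permutation would already take about $1.7 \times 10^3$ key bits.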

Therefore, in practice, a cryptosystem is a subset of the possible permutations
$\mathcal{M}_N \to \mathcal{M}_N$, with a well-specified indexing scheme to properly
associate each key with each accessible permutation. Typically, $K$ also has
$N$ bits and can index $2^N$ permutations only. Obviously the relation between
the key and the permutation is not explicit and is hidden by the algorithm
which tells how to compute $M'$ from $M$ and $K$.

The task of cryptanalysis is to obtain information on the secret key $K$
from the knowledge of some pairs $(M, M')$. In the case where all $(2^N)!$
permutations are possible, the knowledge of $(M, M')$ gives little information
on the chosen permutation, hence on the value of $K$. Indeed, if $(M, M')$
belongs to the secret permutation, there are still $(2^N - 1)!$ permutations to
search for.

The other extreme is given by the following very simple cryptosystem:
$M' = M \oplus K$. This is a rather trivial indexing scheme since $K$ is immediately
determined by $(M, M')$ as $K = M \oplus M'$.

From a combinatorial point of view, the situation is not very different
with practical cryptosystems. With $2^N$ possible permutations of a set of
$2^N$ messages, a pair (M, M') is likely to belong to only one permutation.
Thus, in principle, (M, M') contains a lot of information on K. However, this
information is usually quite difficult to obtain explicitly because the
relation K = K(M, M') is expected to be quite intricate.

The goal of differential cryptanalysis is to obtain information on the key
K by considering how two plain text messages $M_1$ and $M_2$ get encrypted
into $M_1'$ and $M_2'$. Let us now consider such an attack in the case of our
cryptosystem.

5.2 Complexity to break the Crystal algorithm 

With $M_i^{(m)}$ and $K^{(m)}$ denoting the state of the messages and the key
after m rounds, the Crystal algorithm gives

$M_i^{(m)} = PC\left(M_i^{(m-1)} \oplus K^{(m-1)}\right)$ (12)

for i = 1, 2. By XORing the above relation for i = 1 and i = 2 and applying
inverse propagation, we obtain

$P^{-1}\left(M_1^{(m)} \oplus M_2^{(m)}\right) = C\left(M_1^{(m-1)} \oplus K^{(m-1)}\right) \oplus C\left(M_2^{(m-1)} \oplus K^{(m-1)}\right)$ (13)

It is now convenient to introduce $\mathcal{F}^{-1}$, an inverse XOR function
associated with C. Suppose that

$b = C(a_1) \oplus C(a_2)$

for some known z-bit value b and unknown z-bit values $a_1$ and $a_2$. The
question is to know which pairs $a_1$ and $a_2$ can possibly produce the given
b. More specifically, and for reasons that will become clear in a moment, we
want to know the values of $a_1 \oplus a_2$ compatible with b. So we write
$a_1 \oplus a_2 \in \mathcal{F}^{-1}(b)$ iff $b = C(a_1) \oplus C(a_2)$ (14)

For a given collision operator C, $\mathcal{F}^{-1}$ can be computed by
exhaustive search. In practice one has to generate all possible values of
$a = a_1 \oplus a_2$ and compute what $b = C(a_1) \oplus C(a_2)$ it produces.
Since we consider z-bit values, there are $2^z$ ways to choose a and $2^z$
possible values for b. Therefore, $\mathcal{F}^{-1}$ can be represented as a
$2^z \times 2^z$ matrix. A zero element at position (a, b) in this matrix
means that the corresponding b cannot be produced with the corresponding a.
Non-zero elements are set to the number of pairs $(a_1, a_2)$ such that
$C(a_1) \oplus C(a_2) = b$ and $a_1 \oplus a_2 = a$. Indeed, a given b can be
obtained several times with the same a, since each a can be produced from
$2^z$ combinations $a_1 \oplus a_2$, where $a_1$ is free and
$a_2 = a \oplus a_1$. Therefore, the sum of each row of the matrix is $2^z$.

Similarly, each column also sums up to $2^z$. To see it, assume that b is
written as $b_1 \oplus b_2$ where $b_2 = b \oplus b_1$. There are $2^z$ such
values of $b_1$. As C is invertible, we can compute $a_1 = C^{-1}(b_1)$ and
$a_2 = C^{-1}(b_2)$. This pair $(a_1, a_2)$ will satisfy
$C(a_1) \oplus C(a_2) = b$. Thus a given b is obtained $2^z$ times,
distributed over the values of $a = a_1 \oplus a_2$.

By normalizing each entry with $2^z$, one obtains a matrix whose rows and
columns sum up to 1. An example for z = 4 and our procedure to build a
random reversible C is given below.

For instance, the element 1/8 obtained for a = 4 and b = 7 means that
if b = 7, the probability that a = 4 is 1/8. Clearly, if a = 0, it means that
$a_1 = a_2$ and thus $b = C(a_1) \oplus C(a_2) = 0$, and conversely.
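
The exhaustive construction of this matrix can be sketched in a few lines of Python. The sketch below is only illustrative: it assumes that a random permutation of the $2^z$ values can stand in for the collision operator C (this is not the procedure used in this paper to build C), and all function names are ours. It builds the table of counts for z = 4 and checks that every row and every column sums to $2^z$.

```python
import random

def random_invertible_map(z: int, seed: int = 0) -> list[int]:
    """Assumption: a random permutation of the 2**z values stands in for C."""
    values = list(range(2 ** z))
    random.Random(seed).shuffle(values)
    return values

def inverse_xor_table(C: list[int]) -> list[list[int]]:
    """table[a][b] = number of pairs (a1, a2) with a1^a2 == a and C[a1]^C[a2] == b."""
    size = len(C)
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        for a1 in range(size):          # a1 is free, a2 is then fixed
            a2 = a ^ a1
            table[a][C[a1] ^ C[a2]] += 1
    return table

z = 4
C = random_invertible_map(z)
table = inverse_xor_table(C)
row_sums = [sum(row) for row in table]
col_sums = [sum(table[a][b] for a in range(2 ** z)) for b in range(2 ** z)]
# Dividing by 2**z gives the normalized matrix discussed in the text.
normalized = [[count / 2 ** z for count in row] for row in table]
print(row_sums == [2 ** z] * 2 ** z, col_sums == [2 ** z] * 2 ** z)
```

The row a = 0 contains a single non-zero entry, at b = 0, which is the statement that identical inputs give identical outputs.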

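We can now come back to eq. 13. With definition 14, we can rewrite it as

$\mathcal{F}^{-1}P^{-1}\left(M_1^{(m)} \oplus M_2^{(m)}\right) = M_1^{(m-1)} \oplus K^{(m-1)} \oplus M_2^{(m-1)} \oplus K^{(m-1)}$
$\qquad = M_1^{(m-1)} \oplus M_2^{(m-1)}$ (15)

By repeating this relation recursively, one obtains

$M_1^{(1)} \oplus M_2^{(1)} = \left(\mathcal{F}^{-1}P^{-1}\right)^{r-1}\left(M_1^{(r)} \oplus M_2^{(r)}\right)$ (16)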

where r is the number of rounds. Below we will show that if
$M_1^{(1)} \oplus M_2^{(1)}$ is known to the attacker, it is rather easy to
obtain the secret key K. The question we want to investigate first is how
much computational effort is required to obtain $M_1^{(1)} \oplus M_2^{(1)}$
from $M_1^{(r)} \oplus M_2^{(r)}$ which, by hypothesis, is known since
attackers are supposed to have access to any pairs (M, M') they want. The
estimate of the complexity of finding $M_1^{(1)} \oplus M_2^{(1)}$ is given
below.

Since we assume that r > d, where d is the lattice diameter, $M_1^{(r)}$ and
$M_2^{(r)}$ differ over all N/z lattice sites. In order to perform the
backward scheme indicated in eq. 15, one has to find all possible pre-images
by $\mathcal{F}^{-1}$ of $P^{-1}(M_1^{(m)} \oplus M_2^{(m)})$. Empirically we
observe that the number of pre-images of a given b is larger than $2^z/4$. Of
course this depends on the choice of C, but this seems to be a minimal value.
Therefore, for each lattice site, at least $2^{z-2}$ values are possible for
$M_1^{(r-1)} \oplus M_2^{(r-1)}$. This requires selecting $(N/z)\,2^{z-2}$
candidates for $M_1^{(r-1)} \oplus M_2^{(r-1)}$.

The same argument can be repeated $r - d$ times. After that, we can
quickly exclude some possibilities. Indeed, at this point, we know that the
error has not been able to propagate up to the outer boundary of the lattice.
For these lattice sites, $M_1^{(d-1)} \oplus M_2^{(d-1)}$ must be zero. Thus
the number of sites for which the exploration continues is
$(\sqrt{N/z} - 2)^2$. If we undo one more step, even more possibilities can be
excluded and the pre-images of "only" $(\sqrt{N/z} - 4)^2$ sites must be
investigated. Following this idea for the $d - 1$ steps, one has to explore
$3^2 \times 5^2 \times \dots \times (\sqrt{N/z} - 1)^2$ possible
configurations (see footnote 2), each with $2^z/4 = 2^{z-2}$ possible values.

A lower bound for this number is (see appendix B)

$\left(3^2 \times 5^2 \times \dots \times (\sqrt{N/z} - 1)^2\right) 2^{z-2} > (d/2)^{2d}\, 2^{z-2} = \frac{1}{4}\left(\frac{N}{z}\right)^{d} 2^{z-2}$

Thus, in total, undoing the rounds beyond and below the diameter requires
investigating

$\mathcal{N} > (N/z)^{r-d}\, 2^{(z-2)(r-d)}\, (N/z)^{d}\, 2^{z-4} = (N/z)^{r}\, 2^{(z-2)(r-d)+(z-4)}$ (17)

candidates for $M_1^{(1)} \oplus M_2^{(1)}$.

This relation will be discussed in detail in section 5.4. Some values of
$\mathcal{N}$ as a function of N and r are given in table 1.



5.3 The last step to obtain the key 

In this section, we show how to compute the key from a possible candidate
$M_1^{(1)} \oplus M_2^{(1)}$. Since $M_i^{(1)} = PC(M_i^{(0)} \oplus K^{(0)})$
we obtain

$C\left(M_1^{(0)} \oplus K^{(0)}\right) \oplus C\left(M_2^{(0)} \oplus K^{(0)}\right) = P^{-1}\left(M_1^{(1)} \oplus M_2^{(1)}\right)$ (18)

[Footnote 2: for a D2Q8 topology.]

Let us introduce $b_1 = C(M_1^{(0)} \oplus K^{(0)})$,
$b_2 = C(M_2^{(0)} \oplus K^{(0)})$ and
$b = P^{-1}(M_1^{(1)} \oplus M_2^{(1)})$. The quantity b is supposed to be
known.
We now investigate all possible values of $b_1 = 0, 1, \dots, 2^z - 1$. For
each of these $b_1$, $b_2$ can be computed as

$b_2 = b \oplus b_1$

Since C is invertible, we can compute from the definition of $b_1$

$M_1^{(0)} \oplus K^{(0)} = C^{-1}(b_1)$

Thus, from the definition of $b_2$

$M_2^{(0)} \oplus K^{(0)} = C^{-1}(b_2) = C^{-1}(b_1 \oplus b)$

Consequently, the initial key must satisfy

$K^{(0)} = C^{-1}(b_1) \oplus M_1^{(0)} = C^{-1}(b_2) \oplus M_2^{(0)}$ (19)

If $C^{-1}(b_1) \oplus M_1^{(0)} \neq C^{-1}(b_2) \oplus M_2^{(0)}$, then
another choice of $b_1$ must be considered. Otherwise, the value of $K^{(0)}$
obtained from eq. 19 is a possible key value.

The time complexity of this last step is thus $2^z$.
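
The scan over $b_1$ can be sketched for a single lattice site as follows. This is a minimal illustration, not the implementation used for this paper: a random permutation is assumed in place of the actual collision operator C, the site values are made up, and the function name `candidate_keys` is ours.

```python
import random

Z = 8                      # bits per lattice site, as in the text
SIZE = 2 ** Z

# Assumption: a random permutation plays the role of the collision operator C.
rng = random.Random(1)
C = list(range(SIZE))
rng.shuffle(C)
C_INV = [0] * SIZE
for x, y in enumerate(C):
    C_INV[y] = x

def candidate_keys(m1: int, m2: int, b: int) -> list[int]:
    """Scan all b1 and keep the keys for which both sides of eq. 19 agree."""
    keys = []
    for b1 in range(SIZE):              # 2**z iterations in total
        b2 = b ^ b1
        k_from_m1 = C_INV[b1] ^ m1
        k_from_m2 = C_INV[b2] ^ m2
        if k_from_m1 == k_from_m2:
            keys.append(k_from_m1)
    return keys

# Usage with made-up site values (hypothetical data, for illustration only).
secret_key, m1, m2 = 0x5A, 0x13, 0xC7
b = C[m1 ^ secret_key] ^ C[m2 ^ secret_key]   # right-hand side of eq. 18
keys = candidate_keys(m1, m2, b)
print(secret_key in keys, len(keys))
```

Several key values may pass the test for a single pair of messages; in such a sketch, further pairs would be needed to discriminate between them.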
5.4 Security and performance 

Let us now analyze in more detail eq. 17. We write N as $N = 2^\ell$, where
$\ell = \log_2 N$. For $z = 8 = 2^3$, eq. 17 becomes

$\mathcal{N} > 2^{(\ell-3)r} \cdot 2^{(z-2)(r-d)} \cdot 2^{z-4} = 2^{(\ell+3)r - 6d + 4}$

where r > d and $d = (1/2)\sqrt{N/z} = 2^{(\ell-5)/2}$. Some values of N, r and
$\mathcal{N}$ are shown in table 1. Note that the size N = 32 is only given to
illustrate how r changes when the block size increases or decreases. Clearly,
with 32-bit messages, all messages M could be generated to discover which one
encrypts to a given, intercepted M'. In this case, no complex cryptanalysis
would be necessary.
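
The exponent in this bound is straightforward to evaluate. The short Python sketch below (function names are ours) computes $\log_2$ of the right-hand side of eq. 17 for z = 8 and the (N, r) pairs listed in table 1.

```python
import math

def diameter(n_bits: int, z: int = 8) -> float:
    """d = (1/2) * sqrt(N / z), as defined in the text."""
    return 0.5 * math.sqrt(n_bits / z)

def log2_work(n_bits: int, rounds: int, z: int = 8) -> float:
    """log2 of the lower bound of eq. 17: r*log2(N/z) + (z-2)(r-d) + (z-4)."""
    d = diameter(n_bits, z)
    return rounds * math.log2(n_bits / z) + (z - 2) * (rounds - d) + (z - 4)

for n_bits, rounds in [(2048, 13), (512, 13), (128, 14), (32, 17)]:
    # Columns: N, r, d, log2 of the estimated work.
    print(n_bits, rounds, diameter(n_bits), log2_work(n_bits, rounds))
```

For these four pairs the computed exponents are 138, 136, 132 and 134, in agreement with table 1.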

Some remarks about the throughput are now in order. With parallel
hardware, the encryption time grows as the number of rounds but is
independent of N. Thus, throughput clearly benefits from large N, as the
throughput is proportional to N and inversely proportional to r. In the
examples shown in table 1, we see that N = 512 yields the best throughput
while keeping about the same cryptanalytic complexity.

    N       $\ell$    d     r     $\mathcal{N}$
    2048    11        8     13    $2^{138}$
    512     9         4     13    $2^{136}$
    128     7         2     14    $2^{132}$
    32      5         1     17    $2^{134}$

Table 1: The work $\mathcal{N}$ needed to break the cipher as a function of
block size N and number of rounds r. The quantities $\ell$ and d are by
definition $\ell = \log_2 N$ and $d = (1/2)\sqrt{N/z}$. The number of bits per
site is z = 8.




Figure 8: (a) Number of rounds r as a function of block size N, to keep a
given security level S. Note that r must be larger than the diameter d. The
limit r = d is shown by the dashed curve. (b) Throughput versus block size,
for two given security levels and a parallel implementation. The clock tick is
by definition the time required by one encryption round.

Figure 9: (a) Security S as a function of block size N, for r = 2d. (b)
Throughput versus block size, for r = 2d, in the case of a parallel
implementation. The clock tick is the time of one encryption round.

This effect is mostly lost on sequential hardware, since computing time
goes as r x N and the throughput is the block size divided by the computing
time. Thus, throughput is proportional to 1/r. Therefore, for a given value
of $\mathcal{N}$, since r increases as N decreases, larger sizes offer a
better throughput. However, it may be argued that a software implementation of
our algorithm on small system sizes can be significantly faster. For instance,
for N = 32, four table lookups of 256 entries are enough on a 32-bit
architecture to perform both the propagation and the collision on the entire
system. For N = 128, 48 table lookups become necessary to implement collision
and propagation at once. Thus, for the same cryptanalytic complexity, blocks
of 32 bits would provide a throughput about 10 times larger than blocks of
128 bits.

In order to summarize the above discussion and to highlight the link
between security and performance in a parallel implementation, let us
define the security measure S as the logarithm of our estimate of
$\mathcal{N}$

$S = \log_2 \mathcal{N} = (\ell + 3)r - 6d + 4$ (20)

We also define the throughput per clock tick, Q, as

$Q = \frac{N}{r} = \frac{2^\ell}{r}$ (21)

where the clock tick is defined as the time needed to perform one round of
encryption.

In figure 8 (a), we show how r must change with respect to N, for a given
security level S. We have from eq. 20

$r = \frac{S + 6d - 4}{\ell + 3}$ (22)

so that, for each value of N (i.e. for the corresponding values of d and
$\ell$), we can plot the value of r ensuring the security S. Note that the
constraint r > d must be satisfied.

From eq. 22 we also obtain the isosecurity relation between Q and N.
This is shown in figure 8 (b). We see that for a clock rate of 100 MHz, a
throughput of 50 Gb/s is achieved for a block of size N = 8192 bits.
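
Equations 21 and 22 can be evaluated directly. The Python sketch below is illustrative only: the function names are ours, z = 8 is fixed as in eqs. 20 to 22, and the number of rounds is rounded up to the next integer, which is an assumption on our side.

```python
import math

Z = 8  # bits per lattice site; eqs. 20-22 are written for this value

def diameter(n_bits: int) -> float:
    """d = (1/2) * sqrt(N / z)."""
    return 0.5 * math.sqrt(n_bits / Z)

def rounds_for_security(security: float, n_bits: int) -> int:
    """Smallest integer r reaching security S according to eq. 22."""
    ell = math.log2(n_bits)
    return math.ceil((security + 6 * diameter(n_bits) - 4) / (ell + 3))

def throughput_bits_per_tick(n_bits: int, rounds: int) -> float:
    """Q = N / r (eq. 21), for a fully parallel implementation."""
    return n_bits / rounds

for n_bits in (32, 128, 512, 2048, 8192):
    r = rounds_for_security(134, n_bits)
    # The constraint r > d must hold for the estimate to apply.
    print(n_bits, r, r > diameter(n_bits), throughput_bits_per_tick(n_bits, r))
```

As a consistency check on the orders of magnitude, `throughput_bits_per_tick(8192, 16)` gives 512 bits per tick, i.e. 51.2 Gb/s at a 100 MHz clock; the value of 16 rounds is chosen here only for illustration.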

Finally, in figure 9 (a) we show how security S increases with N when we
take the number of rounds r as twice the diameter d. In figure 9 (b) we plot
Q, the resulting throughput per clock cycle.



6 Conclusion

We have shown that discrete physics offers a natural and inspiring
framework to analyze symmetric cryptographic algorithms based on the
diffusion-confusion paradigm. Physical concepts such as entropy, ergodicity,
thermodynamic limits, mixing, Lyapunov exponents, etc. are alternative ways to
describe and quantify cryptographic devices and security issues.

The analogy with discrete physics provides us with simple design
principles to devise families of highly scalable encryption algorithms, as
discussed in detail in the present paper and exemplified with the so-called
Crystal algorithm. A key feature of our approach is that the encryption of
large data blocks is possible with parallel hardware. It results in a higher
throughput and a higher security. At a time when new applications develop and
require the fast encryption of large volumes of data, our approach is a very
promising solution for several emerging technologies.

A Reversibility of the encryption-decryption 
algorithm 

We here prove that the decryption and encryption schemes are actually
identical.

The encryption algorithm consists of n rounds of adding the key to the
message and performing a collision operation followed by a propagation step.
The last round terminates with the addition of the current value of the key
again. The key itself evolves simply with n rounds of collision-propagation
operations.

We propose to write the full algorithm as

    reverse(M), reverse(K)
    propagation(M), propagation(K)
    repeat n times
        M = M xor K
        collision(M), collision(K)
        propagation(M), propagation(K)
    end repeat
    M = M xor K

As a consequence, the decryption will follow the exact same steps. We sketch 
the proof below. 
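
Before going through the proof, the property can be observed on a toy model. The Python sketch below is not the Crystal algorithm itself: it assumes a one-dimensional ring of 16 sites with one byte per site (low nibble moving right, high nibble moving left) and a collision built as C = J o R from a random involution J, so that the two identities used in the proof, PRP = R and CRC = R, hold by construction. All names are ours.

```python
import random

SITES = 16                     # toy ring of 16 sites (assumption)
rng = random.Random(7)

def reverse(state):
    """R: swap the two 4-bit channels (left- and right-movers) at every site."""
    return [((s & 0x0F) << 4) | (s >> 4) for s in state]

def propagate(state):
    """P: low nibbles hop one site to the right, high nibbles one to the left."""
    n = len(state)
    return [(state[(i - 1) % n] & 0x0F) | (state[(i + 1) % n] & 0xF0)
            for i in range(n)]

def random_involution(size):
    """Pair up shuffled values so that J[J[x]] == x for every x."""
    values = list(range(size))
    rng.shuffle(values)
    J = [0] * size
    for x, y in zip(values[0::2], values[1::2]):
        J[x], J[y] = y, x
    return J

J = random_involution(256)
# C = J o R satisfies C R C = R, the property used in the proof.
COLLISION = [J[((x & 0x0F) << 4) | (x >> 4)] for x in range(256)]

def collide(state):
    """C: apply the site-local collision table to every site."""
    return [COLLISION[s] for s in state]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def crystal_toy(message, key, rounds):
    """Follow the steps listed above; returns the new message and key."""
    m, k = propagate(reverse(message)), propagate(reverse(key))
    for _ in range(rounds):
        m = xor(m, k)
        m, k = propagate(collide(m)), propagate(collide(k))
    return xor(m, k), k

message = [rng.randrange(256) for _ in range(SITES)]
key = [rng.randrange(256) for _ in range(SITES)]
cipher, final_key = crystal_toy(message, key, rounds=10)
recovered, recovered_key = crystal_toy(cipher, final_key, rounds=10)
print(recovered == message, recovered_key == key)
```

Running the same routine on the cipher text and the final key returns the original message and key, which is the statement proved below for general P, C and R.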

Let M and K be the initial message and key. After the first application
of the reverse and propagation operators, it is convenient to define

$M^{(0)} = PRM, \qquad K^{(0)} = PRK$ (23)

Let $M^{(m)}$ be the message after m rounds and $K^{(m)}$ the current key
value at these m rounds. With P and C the propagation and collision operators,
the above encryption rule reads

$M^{(m)} = PC\left(M^{(m-1)} \oplus K^{(m-1)}\right), \qquad K^{(m)} = PCK^{(m-1)}$ (24)

where $m = 1, 2, \dots, n - 1$. After the last round, we add the key one more
time and we have

$M^{(n)} = PC\left(M^{(n-1)} \oplus K^{(n-1)}\right) \oplus K^{(n)}, \qquad K^{(n)} = PCK^{(n-1)}$ (25)

In order to show that, by applying the same steps a second time, the
message gets decrypted, we need to derive the following relations:

$PRM^{(n-1)} = PC\left(PRM^{(n)} \oplus PRK^{(n)}\right) \oplus PRK^{(n-1)}$ (26)
$PRM^{(m-1)} = PCPRM^{(m)} \oplus PRK^{(m-1)}, \qquad m = 1, \dots, n - 1$ (27)
$PRK^{(m-1)} = PCPRK^{(m)}, \qquad m = 1, \dots, n$ (28)

The proof of the above three relations is now given. From eq. 25 we
obtain

$RM^{(n)} \oplus RK^{(n)} = RPC\left(M^{(n-1)} \oplus K^{(n-1)}\right)$ (29)

where R is the reverse operator, which is linear: $R(a \oplus b) = Ra \oplus Rb$.
By applying a propagation to both sides of eq. 29 we obtain

$PRM^{(n)} \oplus PRK^{(n)} = RC\left(M^{(n-1)} \oplus K^{(n-1)}\right)$ (30)

because P is linear and PRP = R. We can now apply C on both sides and,
since CRC = R, our equation becomes

$C\left(PRM^{(n)} \oplus PRK^{(n)}\right) = RM^{(n-1)} \oplus RK^{(n-1)}$ (31)

where, again, we have used the linearity of R. This equation can be rewritten
as

$RM^{(n-1)} = C\left(PRM^{(n)} \oplus PRK^{(n)}\right) \oplus RK^{(n-1)}$ (32)

or, after applying P

$PRM^{(n-1)} = PC\left(PRM^{(n)} \oplus PRK^{(n)}\right) \oplus PRK^{(n-1)}$ (33)

This equation shows that $PRM^{(n-1)}$ is obtained from $PRM^{(n)}$ by the same
expression as in eq. 25. This proves equation 26.

For the remaining $m \le n - 1$, we have, from eq. 24

$PRM^{(m)} = PRPC\left(M^{(m-1)} \oplus K^{(m-1)}\right), \qquad PRK^{(m)} = PRPCK^{(m-1)}$ (34)

or

$PRM^{(m)} = RC\left(M^{(m-1)} \oplus K^{(m-1)}\right), \qquad PRK^{(m)} = RCK^{(m-1)}$ (35)

which, by using CRC = R, can be further transformed into

$CPRM^{(m)} = RM^{(m-1)} \oplus RK^{(m-1)}, \qquad CPRK^{(m)} = CRCK^{(m-1)}$ (36)

or, also

$PRM^{(m-1)} = PCPRM^{(m)} \oplus PRK^{(m-1)}, \qquad PCPRK^{(m)} = PRK^{(m-1)}$ (37)

These last equations prove eqs. (27) and (28).

Now we can apply our encryption algorithm a second time on $M^{(n)}$ and
$K^{(n)}$ and show that it gives back M and K. The result of the first reverse
and propagate operation gives $PRM^{(n)}$ and $PRK^{(n)}$. Now let us write the
result of the first iteration. Key addition plus collision and propagation
yield

$PC\left(PRM^{(n)} \oplus PRK^{(n)}\right), \qquad PCPRK^{(n)}$

Using relations (26) and (28), this reduces to

$PRM^{(n-1)} \oplus PRK^{(n-1)}, \qquad PRK^{(n-1)}$

The next iteration starts by adding the current key, namely $PRK^{(n-1)}$.
Thus the message now reads

$PRM^{(n-1)}$

Then follows a collision-propagation step on both the message and the key

$PCPRM^{(n-1)}, \qquad PCPRK^{(n-1)}$

Equations (27) and (28) can now be applied for $m = n - 1$ and produce

$PRM^{(n-2)} \oplus PRK^{(n-2)}, \qquad PRK^{(n-2)}$

The next $n - 2$ rounds work similarly and yield

$PRM^{(0)} \oplus PRK^{(0)}, \qquad PRK^{(0)}$

The final key addition (after the n rounds) gives

$PRM^{(0)} = PRPRM = R^2 M = M$

This completes the proof that the decryption is identical to the encryption
since the original message M is recovered.

B Calculation of a lower bound of $Q_r$

Let us define

$Q_r = 1 \times 3 \times 5 \times \dots \times (2r + 1)$

This quantity can be written as

$Q_r = \frac{(2r+1)!}{2^r\, r!}$

Thus, using Stirling's formula, one has:

$\ln Q_r = \ln(2r+1)! - r\ln 2 - \ln r!$
$\quad = (2r+1)\ln(2r+1) - (2r+1) - r\ln 2 - r\ln r + r$
$\quad > 2r\ln(2r) - 2r - 1 - r\ln 2 - r\ln r + r$
$\quad = 2r\ln 2 + 2r\ln r - r - 1 - r\ln 2 - r\ln r$
$\quad = r\ln 2 + r\ln r - r - 1$
$\quad = r\ln 2 + r\ln(2r/2) - r - 1$
$\quad = r\ln 2 + r\ln 2 + r\ln(r/2) - r - 1$
$\quad = r(2\ln 2 - 1) + r\ln(r/2) - 1$
$\quad > r\ln(r/2)$ (38)

Therefore

$Q_r > (r/2)^r$

and

$Q_r^2 > (r/2)^{2r}$

which is the quantity of interest to bound the amount of work for a differential
cryptanalysis of our algorithm.
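
The closed form of $Q_r$ and the bound of eq. 38 can be verified numerically for small r. The following Python sketch (function names are ours) compares the direct product with $(2r+1)!/(2^r r!)$ and checks that $\ln Q_r > r \ln(r/2)$ over a range of values.

```python
import math

def odd_product(r: int) -> int:
    """Q_r = 1 * 3 * 5 * ... * (2r + 1), computed directly."""
    product = 1
    for k in range(r + 1):
        product *= 2 * k + 1
    return product

def closed_form(r: int) -> int:
    """Q_r = (2r + 1)! / (2**r * r!)."""
    return math.factorial(2 * r + 1) // (2 ** r * math.factorial(r))

for r in range(1, 40):
    q = odd_product(r)
    assert q == closed_form(r)                    # the two expressions agree
    assert math.log(q) > r * math.log(r / 2)      # bound of eq. 38

print(odd_product(3), closed_form(3))
```

The check is exact for the closed form (integer arithmetic) and only numerical for the inequality, which is all that is needed to support the estimate of section 5.2.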